Data storage and retrieval with Python¶
In this notebook we will see how to connect to a SQL database from Python. We will use the following libraries:
pymysql library to connect to a MySQL database
SQLAlchemy library to create Python objects from the data fetched from the database.
SQL Alchemy¶
SQL Alchemy is a library that allows you to create Python objects from the data fetched from your database. It is a very powerful library that allows you to create complex queries and to connect to different types of databases. In this notebook we will see how to use it to connect to a MySQL database.
Install SQL Alchemy¶
To install SQL in your system, use pip:
pip install sqlalchemy
Since we will use it to connect to a MySQL database, we also need to install the library to connect to MySQL databases:
pip install pymysql
Create an engine with SQL Alchemy¶
First, we need to create a connection to our database, using the create_engine function of SQLAlchemy (read the docs). We need to provide the connection details to be able to access the database.
The syntax to connect to a MySQL engine with pyMySQL is:
mysql+pymysql://<username>:<password>@<host>[:port]/<dbname>[?<options>]
is the user name to connect to the database (root in the example)
is the password of the user name provided (your_password in the example)
is the name of the host you need to connect to (localhost in the example)
[:] is an optional parameter you need to add if your database engine is running in a different port (other than the default 3306
is the name of the database you want to connect to (you need to set up an engine per database)
[?] are additional options you can pass to the database connection
For instance:
[5]:
from sqlalchemy import create_engine
# Create your engine.
engine = create_engine("mysql+pymysql://root:my_password@localhost:3307/car_retail")
will create a connection to a MySQL database running in the same machine, with the user root and the password your_password. The database used is car_retail, one of the example databases provided here. If you want to connect to a database running in a different machine, you need to provide the IP address of the machine instead of localhost. If your database is running in a different port than the default 3306, you need to provide the port number as well. For instance, if your database is running in a different machine with IP address
Execute queries¶
Once you have created to your engine, you can execute queries to work with databases. We will primarily use the method execute of the connection object to fetch data from the database. The method execute will return a result object that you can use to fetch the data as shown in the examples below.
Fetch data with SELECT queries¶
If you want to fetch data, you can just insert the query as an argument of the execute method. The method will return a result object, which you can iterate to fetch the data. Each row of the result object is a dictionary with the column names as keys and the values of the row as values. For instance, if you want to fetch all the cars from the database, you can use the following code:
[8]:
cars = engine.execute("SELECT * FROM Cars WHERE EmployeeId = 1")
for car in cars:
print(f"Car ID is {car['Id']} and model is {car['Model']}")
Car ID is 3 and model is Ford Mustang
Car ID is 5 and model is Ford Mustang
Insert data with INSERT queries¶
If you want to insert data, you can use the same method execute, but you need to provide the query as an argument. For instance, if you want to insert a new car in the database, you can use the following code:
[11]:
new_car = {
"CustomerId": 1,
"Model": "Ferrari",
"EmployeeId": 1,
"Status": "READY",
"TotalCost": 1000000
}
res = engine.execute(f"INSERT INTO Cars (CustomerId, Model, EmployeeId, Status, TotalCost) VALUES ({new_car['CustomerId']}, \"{new_car['Model']}\", {new_car['EmployeeId']}, \"{new_car['Status']}\", {new_car['TotalCost']})")
print(f"Inserted {res.rowcount} rows")
Inserted 1 rows
Note that you need to provide the values of the query as a string, so the example uses a formatted string to insert the values of the dictionary new_car in the query. Note also that the values of the strings need to enclosed in quotation marks. You can use escaped double quotes to enclose the strings, as shown in the example, or use simple quotes which do not need to be escaped.
Note also that the method execute returns a result object that contains the number of rows affected by the query. In this case, the result object is res, and the number of rows affected is stored in the attribute rowcount.
SQL Magic and Pandas¶
Another alternative is to use Jupyter Notebooks, Pandas dataframes SQL magic. To enable the SQL magic first you need to install some libraries:
[ ]:
%pip install sqlalchemy
%pip install ipython-sql
%pip install pymysql
SQLAlchemy is used to create Python objects from the data fetched from your database. iPython-SQL installs the magic function, and pymysql is the library used to provide support to connect to SQL databases.
Create an engine with SQL Alchemy¶
First, we need to create a connection to our database, using the create_engine function of SQLAlchemy (read the docs). We need to provide the connection details to be able to access the database:
The syntax to connect to a MySQL engine with pyMySQL is:
mysql+pymysql://<username>:<password>@<host>[:port]/<dbname>[?<options>]
is the user name to connect to the database (root in the example)
is the password of the user name provided (your_password in the example)
is the name of the host you need to connect to (localhost in the example)
[:] is an optional parameter you need to add if your database engine is running in a different port (other than the default 3306
is the name of the database you want to connect to (you need to set up an engine per database)
[?] are additional options you can pass to the database connection
Once you have created to your engine, you can create a connection and fetch data. The main methods you can use with Pandas are
read_sql_table¶
Read SQL allows to fetch all the data from a SQL table and load it to a dataframe. You only need to open a connection with an engine, and use the name of the table as an argument:
[ ]:
import pandas as pd
with engine.connect() as conn, conn.begin():
cars_pd = pd.read_sql_table("cars", conn) #there is a table named cars in the database
print(cars_pd.head())
read_sql_query¶
Similar to the function below, you can use the function read_sql_query to fetch the data using a SQL query:
[ ]:
with engine.connect() as conn, conn.begin():
cars_pd = pd.read_sql_query("SELECT Id, CustomerId, Model, TotalCost, Status FROM Cars WHERE EmployeeId = 1", conn) #there is a table named cars in the database
print(cars_pd.head())
SQL Magic Syntax¶
You are now probably thinking, cool, but where is the magic? Well, the SQL magic is a modifier that will allow us to simplify the syntax used above. To use it, first we need to load it with:
[ ]:
%load_ext sql
Once the magic is loaded, we can connect to a database simply by using the SQL magic prefix $sql and the connection string using the same syntax above:
[ ]:
%sql mysql+pymysql://root:localhost@localhost/car_retail
Now, you can fetch data from the database just writing the magic before the query. For instance, to save the result from the query above in a variable we can just use:
[ ]:
result = %sql SELECT Id, CustomerId, Model, TotalCost, Status FROM Cars WHERE EmployeeId = 1
result
We can then convert the result to a dataframe using the method Dataframe of the result object:
[ ]:
cars_pd = result.DataFrame()
cars_pd.head()
Multiple lines syntax¶
Sometimes we want to format our SQL query to span several lines of codes for the sake of readability. In this cases, we can use the multiline syntax using the << operator in the SQL magic, as:
[ ]:
%%sql result_2 <<
SELECT
Id,
CustomerId,
Model,
TotalCost,
Status
FROM Cars
WHERE EmployeeId = 1;
Now in another cell we can convert the result to a Pandas dataframe as
[ ]:
cars_pd = result_2.DataFrame()
cars_pd.head()